The Role of the Trainer in Reinforcement Learning
Abstract
In this paper we propose a three-stage incremental approach to the development of autonomous agents. We discuss some of the characteristics that differentiate reinforcement programs (RPs), and define the trainer as a particular kind of RP. We present a set of results obtained by running experiments with a trainer that provides guidance to the AutonoMouse, our mouse-sized autonomous robot.

1 THE THREE STAGES OF THE DEVELOPMENTAL APPROACH

Reinforcement learning (RL) problems are those in which an agent must learn how to maximize a scalar return that is functionally related to its actions. Among the most studied and best-known algorithms for solving RL problems are Q-learning (Watkins, 1989), the adaptive heuristic critic (Barto, Sutton and Watkins, 1990), and the learning classifier system (Booker, Goldberg and Holland, 1989), which was the learning paradigm used in the experiments of this paper. An issue we have been interested in recently (Dorigo, 1992; Dorigo and Colombetti, 1994) is the role a trainer can play in helping a reinforcement learning system develop into a working control system. In this paper we propose the three developmental phases (3DP) model, a methodology for developing a learning agent based on the idea that an agent should go through different stages of development during its life; we call these phases the baby phase, the young phase, and the adult phase (Fig. 1). In the baby phase the agent starts with no a priori knowledge about how to act in the world. In this phase it is helped by a trainer, which gives a positive or negative reward after each action the agent performs. This phase lasts until the agent reaches a performance level defined by the designer of the system. In the young phase the trainer is disconnected and the learner goes on learning using solely environmental reinforcements.
In this phase the learning agent uses delayed reinforcements to refine the knowledge acquired during the training phase. In principle, the learning agent could bypass the baby phase, but this would make learning much harder, and therefore slower. As in the preceding case, this phase lasts until the agent reaches a designer-defined performance level, which is higher than in the baby stage. In the adult phase the learning algorithm is switched off and the agent continues to perform using the acquired knowledge. The reason for switching off the learning algorithm is that in this way the computational burden diminishes, as the agent …

[Fig. 1: the three developmental phases; labels include Monitoring, Trainer, Baby, Learning Agent]
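The 3DP schedule above can be sketched as a training loop wrapped around a generic learner. This is a minimal illustrative sketch, not the paper's implementation: it assumes a tabular one-step Q-learning agent (the paper's experiments actually used a learning classifier system), and every name here (Agent, develop, trainer_reward, the environment interface) is our own assumption.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters

class Agent:
    """Tabular epsilon-greedy Q-learner (stand-in for the paper's learner)."""
    def __init__(self, n_states, n_actions):
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def act(self, s):
        if random.random() < EPSILON:
            return random.randrange(len(self.q[s]))
        row = self.q[s]
        return row.index(max(row))

    def learn(self, s, a, r, s_next):
        # One-step Q-learning update toward r + gamma * max_a' Q(s', a').
        target = r + GAMMA * max(self.q[s_next])
        self.q[s][a] += ALPHA * (target - self.q[s][a])

def run_phase(agent, env, reward_fn, learning, steps):
    """Run one developmental phase; reward_fn maps (s, a, env_reward) -> reward."""
    s = env.reset()
    for _ in range(steps):
        a = agent.act(s)
        s_next, env_reward = env.step(a)
        if learning:
            agent.learn(s, a, reward_fn(s, a, env_reward), s_next)
        s = s_next

def develop(agent, env, trainer_reward, steps=(1000, 1000, 1000)):
    # Baby phase: the trainer rewards every action immediately.
    run_phase(agent, env, trainer_reward, learning=True, steps=steps[0])
    # Young phase: trainer disconnected; environmental reinforcement only.
    run_phase(agent, env, lambda s, a, r: r, learning=True, steps=steps[1])
    # Adult phase: learning switched off; the agent exploits what it learned.
    run_phase(agent, env, lambda s, a, r: r, learning=False, steps=steps[2])
```

The point of the sketch is that only the reward source and the learning switch change between phases; the agent and environment interfaces stay fixed, which is what lets the trainer be "disconnected" without retraining from scratch.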